Method and Apparatus for Analysing Data Representing Attributes of Physical Entities

ABSTRACT

Electronic data comprising, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the entity may be used to generate a model for predicting the outcome value for another physical entity of the same type. The data is processed using a statistical modelling method to generate a model based on the data. The method then involves calculating a case deleted estimate of the outcome value for each of the set of physical entities using the processor; calculating a measure of the deviance of the case deleted estimates from the actual outcome values in the input data; and outputting the calculated deviance measure to the data storage for retrieval by a user.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/468,838, filed May 10, 2012, which is a continuation under 35 U.S.C. §120 of International Application No. PCT/GB2011/052296, filed Nov. 23, 2011, and claims priority under 35 U.S.C. §119(a) to Great Britain Application No. 1020091.3, filed Nov. 26, 2010, the entire content of each of which is hereby fully incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to analysis of electronic data which comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity. Such analysis is widely used to generate a model for predicting the outcome value (that is, the most likely outcome or value of a chosen metric) for another physical entity of the same type.

BACKGROUND OF THE INVENTION

1.1 Current Statistical Techniques

1.1.1 Current modelling techniques use the generalised linear modelling framework to estimate parameters (that is, coefficients) for a given model structure based upon calculating the minimum deviance (maximum likelihood) estimates for the parameters from a given dataset.

1.1.2 First the structure of the model to be produced (both the appropriate link function and distribution) is established using an understanding of the data, and then by considering residual plots and the results of the Tweedie distribution test (Box-Cox transformation).

1.1.3 The significance of parameter estimates can then be judged according to the standard errors calculated from the information matrix, and various statistical tests, for example the Chi Squared and F-Tests, can be used to compare two competing models.

1.1.4 A range of other statistics, such as the Akaike Information Criterion ("AIC") and the Bayesian Information Criterion ("BIC"), can also be considered.

These statistical approaches were originally utilised in the context of relatively few factors and levels and relatively few interactions. The range of factors, the number of levels within factors and the number of interactions have increased significantly in UK personal lines insurance as insurers have sought competitive advantage and, more recently, to prevent anti-selection on price comparison sites (the "Winner's Curse").

1.2 Short-comings of Statistical Techniques

1.2.1 As discussed, over time the size of modelling datasets has increased (datasets of up to 100 million rows are becoming more common), and this has highlighted the differences between academic methods designed for a few thousand rows and actual insurance-specific models deployed to determine prices.

1.2.2 In particular, the Degrees of Freedom are defined as the number of rows of data minus the number of (unaliased) parameters. This becomes effectively constant where the dataset is large, since the parameter list rarely exceeds 1000.

1.2.3 The Deviance for a model decreases as new parameters are added. Hence the ratio of residual deviance to number of degrees of freedom always improves when the degrees of freedom is effectively constant. This causes Chi Squared tests on nested models and F-Tests to accept parameters which would be rejected from a business perspective as spurious and over-parameterised.

1.2.4 The likelihood of overfitting clearly increases the more parameters are added to the model, be they factors, levels or interactions.

1.3 Current Business Techniques

1.3.1 There is wide recognition that modelling is not a pure science and better results can be achieved using domain knowledge (by applying some "art"). The statistical techniques are usually supported by checking the models against business understanding of the factors, their usual significance and trends from past time periods and other datasets.

1.3.2 Time consistency testing is used to ensure that a factor shape is consistently present for given time periods, and to establish whether there is a trend for the shape to strengthen, weaken or change over time. This is essential if the chosen values of the parameters are to be predictive for a future time period, which is normally the business objective.

1.3.3 To mitigate the problems outlined in 1.2, extensive use is also often made of hold-out sample data. This is where a model is built on a sample of the data, say 80% (the modelling or training data), and its performance is then judged by comparing results scored against the remaining 20% (the hold-out sample). Often, though, this approach provides only apparent comfort, as it is not clear what a good or bad "fit" looks like when judged on the hold-out sample. An in-time hold-out sample will have the same mix of business, and most adequate models will appear to fit well when applied to it. Also, when a poor "fit" is detected, it will not necessarily be clear how to correct for any overfitting.

1.3.4 There is also the problem of the range of observations that any modelling data contains, be it due to the underwriting footprint strategy or the particular channel through which business is distributed. Underlying factor effects such as interactions may not be identified due to the lack of observed data. Here business knowledge is deployed, often in the form of underwriting overlays to statistical models before applying the models to the market.

1.4 Price Comparison Websites, Efficient Market, Winner's Curse

1.4.1 In recent years the rise of price comparison websites, particularly in the UK motor insurance market, has created a near perfect market for consumers. Coupled with the fact that many view motor insurance as a commodity product, this has resulted in observed new business elasticities ranging in magnitude from 10 to 100.

1.4.2 The estimates from a pricing model are best estimates in the statistical sense and hence are subject to uncertainty. In these circumstances the Winner's Curse operates as a powerful anti-selection effect which imposes a heavy penalty where the uncertainty randomly results in an estimate which is below the true value.

1.4.3 In this business context insurers have responded by increasing the range of factors, levels within factors and number of interactions as they have tried to minimise the level of anti-selection. But in doing so there is an increased likelihood of overfitting, which presents a real business dilemma. When presented with a new factor to implement which makes sense from a business viewpoint and is significant, the view will, more often than not, be to introduce the factor. In fact it is very likely that when one systematically reviews the inclusion of each term in a sophisticated model, a business sense argument can be made for each and every one, but it is also likely that, taken together, there will be an element of overfitting.

1.4.4 In addition to the parameter estimates, the modelling process makes available results which reveal the uncertainty attached to them, expressed as a Variance/Covariance matrix. Also available is the Hat matrix, which displays the influence that each data point has had on its corresponding estimate.

1.4.5 There are a number of elements which influence how this uncertainty varies from model to model, and by risk within the model. Two elements of this uncertainty are discussed below.

1.4.6 The first is the tendency for over-parameterised models to replicate noise within the data which will not be repeated in future observations. This noise is one source of estimate uncertainty.

1.4.7 The second is the tendency for models to be used over a heterogeneous domain. Some areas of the domain are well populated and hence estimates there are subject to less uncertainty. The fringes of the domain tend to be sparsely populated with observations, resulting in greater levels of uncertainty. Extrapolation to future time periods, which is necessary for the deployment of most predictive models, is a special case of this.

SUMMARY OF THE INVENTION

The present invention provides a method for analysing input electronic data using an electronic processor, wherein the input data comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity, the analysis generates a model for predicting the outcome value for a further physical entity on the basis of data comprising the attribute values associated with the further physical entity, and the method comprises the steps of:

(a) receiving the input data via an input of the processor and storing it in electronic data storage;

(b) retrieving the input data from the data storage and processing the input data with the processor using a statistical modelling method to generate a model based on the input data;

(c) calculating a case deleted estimate of the outcome value for each of the set of physical entities using the processor;

(d) calculating a measure of the deviance of the case deleted estimates from the actual outcome values in the input data; and

(e) outputting the calculated deviance measure to the data storage for retrieval by a user.

This measure may enable a user to refine a model in a more accurately predictive manner. The present methods provide an adjustment to the results which makes them more predictive of future outcomes by providing insulation from noise in the input data.

The model may be used to predict an outcome value which may, for example, represent the likelihood of an event occurring in the case of the further physical entity. The model information may assist the management and planning of resources, for example.

The method may include a step after step (e) of:

- calculating the number and location of knots to include in the model to minimise the deviance measure.

In a preferred implementation, the method includes the steps after step (e) of:

- identifying at least one attribute to omit from the model on the basis of the associated deviance measure; and
- removing that attribute from the model.

The invention further provides a method for analysing input electronic data using an electronic processor, wherein the input data comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity, the analysis generates a model for predicting the outcome value for a further physical entity on the basis of data comprising the attribute values associated with the further physical entity, and the method comprises the steps of:

(a) receiving the input data via an input of the processor and storing it in electronic data storage;

(b) retrieving the input data from the data storage and processing the input data with the processor using a statistical modelling method to generate an intermediate model based on the input data, the intermediate model comprising parameter estimates and a variance/covariance matrix;

(c) calculating a case deleted estimate of the outcome value for each of the set of physical entities on the basis of the intermediate model using the processor; and

(d) generating a noise reduced model comprising noise reduced parameters, a noise reduced variance/covariance matrix, and noise reduced case deleted estimates using an iterative process so as to minimise a measure of the deviance of the noise reduced case deleted estimates from the actual outcome values in the input data.

Accordingly, the estimates produced by the intermediate model are adjusted to make them more predictive. The outputs of the intermediate model are tempered by penalising uncertain parameters to the extent that they are only rewarded for improving the likelihood (reducing Deviance) of the estimates as measured against hold-out sample data.

In a preferred embodiment of this method, in step (d) the parameters β_(j) are replaced in the noise reduced model by noise reduced parameters β*_(j), with the noise reduced variances

${{{Var}\left( \beta_{j}^{*} \right)} = {\left( \frac{\beta_{j}^{*}}{\beta_{j}} \right)^{2}{{Var}\left( \beta_{j} \right)}}},$

the noise reduced covariances

${{{Cov}\left( {\beta_{j}^{*},\beta_{k}^{*}} \right)} = {\left( {\frac{\beta_{j}^{*}}{\beta_{j}}\frac{\beta_{k}^{*}}{\beta_{k}}} \right){{Cov}\left( {\beta_{j},\beta_{k}} \right)}}},$

and the noise reduced case deleted linear predictors

$\eta_{(i)}^{*} = \eta_{i}^{*} - \left( \frac{h_{i}^{*}}{1 - h_{i}^{*}} \right) g^{\prime}\left( \mu_{i} \right)\left( y_{i} - \mu_{i} \right), \quad \text{where} \quad h_{i}^{*} = \sum\limits_{jk} \frac{X_{ij}\beta_{j}^{*}C_{jk}X_{ik}\beta_{k}^{*}W_{i}}{\beta_{j}\beta_{k}}.$

The method may include a step after step (d) of:

- calculating the number and location of knots to include in the noise reduced model to minimise the deviance measure.

Furthermore, the method may include the steps after step (d) of:

- identifying at least one attribute to omit from the noise reduced model on the basis of a measure of the deviance of the attribute relative to the noise reduced model; and
- removing that attribute from the noise reduced model.

Calculation step (c) of the present methods preferably comprises calculating the case deleted estimate directly for each entity, without running the intermediate model for each entity on the basis of the input data with the data associated with that entity omitted.

Linear predictors and estimates are related by the link function in A.1.6, namely η_(i)=g(μ_(i)). The case deleted version is similar: η_((i))=g(μ_((i))).

For all the link functions there is a simple inverse function for g( ), so that once the linear predictor has been calculated, the estimate can be obtained. The log( ) link function is used in some examples, exp( ) being its inverse.
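By way of illustration only, a minimal sketch of these link/inverse pairs (the dictionary layout and names are our own, not part of the specification):

```python
import numpy as np

# Link functions g and their inverses g^{-1}, following the text's convention:
# eta = g(mu) is the linear predictor; mu = g^{-1}(eta) recovers the estimate.
links = {
    "log":      (np.log, np.exp),                        # Poisson / Gamma models
    "logit":    (lambda mu: np.log(mu / (1 - mu)),       # Binomial models
                 lambda eta: 1 / (1 + np.exp(-eta))),
    "identity": (lambda mu: mu, lambda eta: eta),        # classical linear model
}
```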

The preferred method involves calculating the case deleted linear predictors by adjusting the linear predictor provided by the intermediate model, by subtracting an amount equal to the influence on the model caused by the respective datapoint. This influence is described by the distance from the model to the datapoint (y_(i)−μ_(i)), times the influence

$\left( \frac{h_{i}}{1 - h_{i}} \right),$

times the rate of change of the linear predictor by the estimate,

$\frac{\partial\eta_{i}}{\partial\mu_{i}} = {{g^{\prime}\left( \mu_{i} \right)}.}$

Hence, calculation step (c) may comprise calculating case deleted linear predictors η_((i)) such that:

$\eta_{(i)} = \eta_{i} - \left( \frac{h_{i}}{1 - h_{i}} \right) g^{\prime}\left( \mu_{i} \right)\left( y_{i} - \mu_{i} \right), \quad \text{where} \quad h_{i} = \sum\limits_{jk} X_{ij}C_{jk}X_{ik}W_{i} \quad \text{and} \quad \frac{\partial\eta_{i}}{\partial\mu_{i}} = g^{\prime}\left( \mu_{i} \right).$
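A minimal sketch of this direct calculation, assuming the fitted quantities are available as NumPy arrays and a log link (so that g′(μ_(i)) = 1/μ_(i)); the function name and array layout are our own, not the specification's:

```python
import numpy as np

def case_deleted_estimates(X, W, y, mu, eta):
    """Direct "Case Deleted" Estimates for a log-link GLM (sketch).

    X   : (n, p) design matrix
    W   : (n,)   iterative weights W_i
    y   : (n,)   observed outcomes
    mu  : (n,)   estimates from the full intermediate model
    eta : (n,)   linear predictors, eta = log(mu)
    """
    # C = (X^T W X)^(-1), the Variance/Covariance matrix of the parameters
    C = np.linalg.inv(X.T @ (W[:, None] * X))
    # Hat diagonal h_i = sum_jk X_ij C_jk X_ik W_i
    h = np.einsum("ij,jk,ik->i", X, C, X) * W
    # Case deleted linear predictor; g'(mu) = 1/mu for the log link
    eta_del = eta - (h / (1 - h)) * (1.0 / mu) * (y - mu)
    # Inverse link recovers the case deleted estimate without refitting
    return np.exp(eta_del)
```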

In further implementations, as noted above, calculation step (c) may comprise calculating a case deleted estimate for each entity by running the intermediate model on the basis of the input data with the attribute values associated with that entity omitted to generate a respective set of case deleted model parameters.

The case deleted estimate may be calculated by taking the intermediate model (fitted on the full dataset), extracting the one datapoint in question (or applying a zero weight to its importance), and refitting. Given that the intermediate model is available as a starting point, this process only involves an iteration or two, as described in A.1.10 below.

However, when there are many datapoints (possibly 100 million), even a single iteration here for each is laborious.

The preferred method is to calculate approximations of the case deleted linear predictors (as discussed above), and use the inverse link function to obtain a case deleted estimate without refitting the intermediate model. Computationally this may be, for example, a million times more efficient than fitting the model 100 million times.

The statistical modelling method used to generate a model on the input data may generate a Generalised Linear Model or a Generalised Non-linear Model, for example.

The true mechanism for scaling back parameters described herein provides a further benefit. Simply allowing a model to become over-parameterised as it is developed, and reporting parameter errors which state they are not significant, is not enough. With this method we can go further, and scale back the poor parameters, effectively neutralising them within the model. Pruning processes may then operate to remove them altogether. This allows the user to focus upon finding potential factors in the knowledge that unsuccessful attempts will not damage the output.

A company may choose to build the present modelling techniques on top of a market rates model, so that rather than scaling back towards the mean, parameters are scaled back towards market rates instead. Therefore a company would only differ from market structures where it had sufficient data to confirm a significant difference in experience.

BRIEF DESCRIPTION OF THE DRAWINGS

Prior art techniques and embodiments of the invention will now be described by way of example and with reference to the accompanying drawings wherein:

FIG. 1 is a diagram illustrating the effect of case deletion;

FIG. 2 is a plot of standard error against parameter value for an accident damage frequency data set;

FIGS. 3 and 4 are plots representing the accuracy of case deleted estimates generated for a Log-Poisson and a Logit-Binomial model, respectively;

FIGS. 5 to 8 are plots of models generated for different factors for different knot positions;

FIG. 9 is a diagram illustrating case deletion;

FIG. 10 is a plot illustrating the decrease in value of a model over time;

FIGS. 11 to 17 show plots relating to models generated using methods embodying the present invention and sample data sets;

FIG. 18 illustrates an example embodiment of a method for analysing input data representing attributes of physical entities;

FIG. 19 illustrates another example embodiment of a method for analysing input data representing attributes of physical entities; and

FIG. 20 illustrates an example hardware circuit diagram of a general purpose computing device according to certain aspects described herein.

DETAILED DESCRIPTION

2. Case Deletion

2.1 Elimination of Outliers by Case Deletion

2.1.1 Using a measure of residuals such as the Cook's Statistic, outlier points can be excluded from the model based on their undue influence on the parameter estimates.

2.1.2 This technique is supported by leading statistical packages, but for datasets of the scale currently in use, deleting outliers is an onerous and unproductive task.

2.1.3 In essence each data point acts to pull the model towards itself, and the exclusion of that point and refitting of the parameters will result in a new set of parameter values and hence a new "Case Deleted" Estimate for that data point. By definition that estimate will lie further from the observed data point than the estimate produced by the full model. This is illustrated in FIG. 1, where y_(i) are the original datapoints, μ_(i) are the estimates of those datapoints generated initially, and μ_((i)) are the "Case Deleted" Estimates.

3. Calculation of “Case Deleted” Estimates

3.1 Formula for Case Deleted Parameters

3.1.1 McCullagh & Nelder suggest two methods for calculating "Case Deleted" parameters.

3.1.2 McCullagh & Nelder p396 discusses the idea of Case Deletion in the standard sense, as a means to identify whether to exclude individual outlier points from an analysis. They discuss the impact on the model fit of removing the point, note that this is slow if the model needs to be refitted, and suggest that a first step approximation is used. For our purposes, even if a single iteration were accurate enough, a set of "Case Deleted" Parameters is still required for every data point, which as noted in Berry would be impractically slow.

3.1.3 On p406 they quote a result from Atkinson for the linear case

${{\hat{\beta}}_{(j)} - {\hat{\beta}}_{j}} = \frac{{- \left( {X^{T}{WX}} \right)^{- 1}}{x_{i}\left( {y_{i} - \mu_{i}} \right)}}{\left( {1 - h_{i}} \right)}$

where the Hat diagonal is defined as

$h_{i} = \mathrm{diag}_{i}\left( W^{1/2} X \left( X^{T} W X \right)^{-1} X^{T} W^{1/2} \right)$

and suggest a modification for the generalized linear case

${\hat{\beta}}_{(j)} - {\hat{\beta}}_{j} = \frac{- \left( X^{T} W X \right)^{-1} x_{i}\left( z_{i} - \eta_{i} \right)}{1 - h_{i}}, \quad \text{where} \quad z_{i} = g\left( y_{i} \right).$

3.1.4 For the linear case this can be used to generate the estimatedirectly

$\eta_{(i)} = \eta_{i} - \left( \frac{h_{i}}{1 - h_{i}} \right)\frac{y_{i} - \mu_{i}}{W_{i}}, \quad \text{with} \quad h_{i} = \sum\limits_{jk} X_{ij}C_{jk}X_{ik}W_{i}.$

This formula and the one suggested in 3.1.3 have been found to be incorrect for the linear and generalised linear models, and a new one is proposed in section 5.2.3.

4. “Case Deleted” Deviance

4.1 Calculation of "Case Deleted" Deviance

4.1.1 Taking the "Case Deleted" Estimate provides a means to calculate a new "Case Deleted" Deviance. This is in effect the limiting case of calculating the deviance for a hold-out sample of one row against a model based on "n−1" rows, as the new estimate is not influenced by the observed value itself.

4.1.2 The Standard Deviance is a measure of the distance from the observed values to the estimate. In the extreme case where the model contains a parameter for every data point, the estimates and the observed values will be equal and the deviance will have a minimum value. The model here is replicating both the Pattern in the data and the Noise.

4.1.3 Because the "Case Deleted" Deviance is calculated from estimates which are independent of the observed values, it represents the pattern but without the noise related to the observed data point in question. An extreme model will still include noise generated by the other data points, but provided the data points are independent this should average to zero.

4.1.4 A number of practical tests have been conducted comparing the Standard and "Case Deleted" Deviances. From these it is helpful to define some terms. Let SD₁, SD₂ be the Standard Deviances from a base model and an adjusted model. If the adjusted model is created by adding parameters to the base model, then we know that SD₁>SD₂. Similarly take CDD₁, CDD₂ to be the "Case Deleted" equivalents. Interestingly it is possible for CDD₂ to be larger than CDD₁ in circumstances where the extra parameters are adding more Noise to the model than Pattern.

4.1.5 Define Pattern_(1,2)=CDD₁−CDD₂ and Noise_(1,2)=SD₁−SD₂−Pattern_(1,2).

The value of these measures is considered below and compared to existing tests.

4.2 Correlation with Standard Errors

4.2.1 The first example involved a Log-Poisson model with an Accident Damage Frequency dataset containing 1m rows, with around 200 parameters covering a range of factors.

4.2.2 For each parameter in the model a new sub-model was created with that single parameter deleted. Then the Noise and Pattern measures were calculated between the full model and the sub-model.

4.2.3 Standard Errors over 50% are generally considered to be poor, as this corresponds to the parameter value being two standard errors, which equates to the 95% significance level of a normal distribution test.

The two tests showed a strong correlation. Defining Value_(1,2)=Pattern_(1,2)−5*Noise_(1,2) shows a positive value when the Standard Error is less than 50% and negative above, as FIG. 2 demonstrates. While 5 appears a sensible value to choose in this example, adjustment may be required for other model structures and datasets.
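As a sketch, the three measures can be computed directly from the four deviances (the function and argument names are our own; the penalty of 5 is the example value above, which may need adjusting for other model structures):

```python
def pattern_noise_value(sd1, sd2, cdd1, cdd2, noise_penalty=5.0):
    """Pattern, Noise and Value measures between a base model (1) and an
    adjusted model (2), per the definitions in 4.1.5 and 4.2.3.

    sd1, sd2   : Standard Deviances of the two models
    cdd1, cdd2 : "Case Deleted" Deviances of the two models
    """
    pattern = cdd1 - cdd2                    # genuine signal captured
    noise = (sd1 - sd2) - pattern            # apparent improvement that is noise
    value = pattern - noise_penalty * noise  # positive => worth keeping
    return pattern, noise, value
```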

5. Calculation of “Case Deleted” Estimates

5.1 Formula for Generalized Linear “Case Deleted” Estimates

5.1.1 The formula in 3.1.4 was also tested in the generalized linear case of a Log-Poisson model and found to be 99.8% accurate, albeit with a slight bias. FIG. 3 shows

${- \left( \frac{h_{i}}{1 - h_{i}} \right)}\frac{\left( {y_{i} - \mu_{i}} \right)}{W_{i}}$

on the x-axis and

$\left( {\eta_{(i)} - \eta_{i}} \right)/\left( {{- \left( \frac{h_{i}}{1 - h_{i}} \right)}\frac{\left( {y_{i} - \mu_{i}} \right)}{W_{i}}} \right)$

on the y-axis, where the actual η_((i)) have been calculated with a full model fit per data point.

5.1.2 Likewise the second formula

${{\hat{\beta}}_{(j)} - {\hat{\beta}}_{j}} = \frac{{- \left( {X^{T}{WX}} \right)^{- 1}}{x_{i}\left( {z_{i} - \eta_{i}} \right)}}{\left( {1 - h_{i}} \right)}$

was also tested and rejected.

5.1.3 This method has also been checked on a Logit-Binomial model, giving the results shown in FIG. 4.

5.1.4 Armed with this new method we now have the ability to generate η_((i)) directly from a single model fit.

5.2 Bayesian Understanding of the “Case Deleted” Estimates

5.2.1 The Hat matrix provides the influence of each data point on the parameters. The totals of each row add to one, and hence the entries can be thought of as credibilities in a Bayesian context.

5.2.2 For a linear model the estimate will be formed as follows:

$\eta_{i} = {\sum\limits_{p}\; {h_{p}{y_{p}.}}}$

This can be rearranged as follows

$\eta_{i} = {{h_{i}y_{i}} + {\sum\limits_{p \neq i}\; {h_{p}y_{p}}}}$

then observing that η_((i)) is the equivalent developed from one less data point,

$\eta_{(i)} = \frac{\sum\limits_{p \neq i} h_{p}y_{p}}{\sum\limits_{p \neq i} h_{p}} = \frac{\sum\limits_{p \neq i} h_{p}y_{p}}{1 - h_{i}}, \qquad \eta_{i} = h_{i}y_{i} + \left( 1 - h_{i} \right)\eta_{(i)},$

so

$\eta_{(i)} = \frac{\eta_{i} - h_{i}y_{i}}{1 - h_{i}} = \eta_{i} - \left( \frac{h_{i}}{1 - h_{i}} \right)\left( y_{i} - \eta_{i} \right).$
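This identity can be verified numerically for an ordinary least squares fit; the following sketch (our own construction, using simulated data) compares the closed form against brute-force leave-one-out refits:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Full fit, with hat diagonal from H = X (X^T X)^(-1) X^T (unit weights)
C = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, C, X)
eta = X @ (C @ X.T @ y)

# Closed form: eta_(i) = eta_i - (h_i / (1 - h_i)) (y_i - eta_i)
eta_del = eta - (h / (1 - h)) * (y - eta)

# Brute force: refit the model with each row removed in turn
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    assert np.isclose(X[i] @ beta_i, eta_del[i]), i
print("closed form matches leave-one-out refits")
```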

5.2.3 For the Generalized Linear Model a first order approximation wouldbe

$\eta_{(i)} = {{\eta_{i} - {\left( \frac{h_{i}}{1 - h_{i}} \right)\frac{\partial\eta_{i}}{\partial\mu_{i}}\left( {y_{i} - \mu_{i}} \right)}} = {\eta_{i} - {\left( \frac{h_{i}}{1 - h_{i}} \right){g^{\prime}\left( \mu_{i} \right)}\left( {y_{i} - \mu_{i}} \right)}}}$

For Log-Poisson and Logit-Binomial models g′(μ_(i))V(μ_(i))=1, giving

${3.1{.4}\mspace{11mu} \eta_{(i)}} = {\eta_{i} - {\left( \frac{h_{i}}{1 - h_{i}} \right)\frac{\left( {y_{i} - \mu_{i}} \right)}{W_{i}}}}$

for unit weights.

5.2.4 We undertook a numerical check of a Log-Gamma model, as for this structure

${W_{i}{g^{\prime}\left( \mu_{i} \right)}} = {\frac{\omega_{i}}{\phi \; \mu_{i}}.}$

For this model we found that

$\eta_{(i)} = {\eta_{i} - {\left( \frac{h_{i}}{1 - h_{i}} \right){g^{\prime}\left( \mu_{i} \right)}\left( {y_{i} - \mu_{i}} \right)}}$

and hence reject 3.1.4.

6. Applications to Model Comparison

6.1 Optimal Knot Position Application for Factor Splines

6.1.1 This example is a case study using the same dataset as 4.2.1, and using the "Value" measure from 4.2.3 above.

6.1.2 For a number of factors (Policyholder Age, Vehicle Group, Rating Area, NCD, Convictions, Number of Years Licence Held), simple X-Values equal to the level number were defined.

6.1.3 A knot was then added to a spline defined with these X-Values, looping through each integer position for the knot and calculating the "Value" measure. The best position for the knot was selected and then the process was repeated, adding another knot. This was continued until an extra knot reduced "Value".

6.1.4 This is not an efficient process, since the smooth results obtained indicate the maximum "Value" position could be found in fewer steps. However, the result graphs are more complete if every position is calculated for the result charts referred to below.

6.1.5 Once an extra knot had been added, this method did not recheck that the existing ones should remain in their current positions. An efficient implementation would first derive the number of knots required, then find their approximate positions, and finally jiggle them to find the global optimum.

6.1.6 Although the process of calculating the additional "Noise" was performed at each step, this turned out to be quite stable, hence the knot position could be estimated from the unadjusted deviance alone. The "Noise" adjustment is only needed to define the absolute "Value" of adding an extra knot.
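A sketch of the greedy search described in 6.1.3 to 6.1.6; fit_model_with_knots and value_measure are assumed helpers wrapping the modelling step and the "Value" calculation, and are not defined in the specification:

```python
def greedy_knot_search(levels, fit_model_with_knots, value_measure):
    """Greedy placement of spline knots, per section 6.1.3 (sketch).

    levels               : iterable of candidate integer knot positions
    fit_model_with_knots : callable(knots) -> fitted model (assumed helper)
    value_measure        : callable(model) -> "Value" statistic (assumed helper)
    """
    knots = []
    best_value = value_measure(fit_model_with_knots(knots))
    while True:
        # Try every remaining integer position for the next knot
        candidates = [k for k in levels if k not in knots]
        if not candidates:
            return knots
        scored = [(value_measure(fit_model_with_knots(knots + [k])), k)
                  for k in candidates]
        value, knot = max(scored)
        if value <= best_value:      # an extra knot reduces "Value": stop
            return knots
        knots.append(knot)
        best_value = value
```

As noted in 6.1.5, a more efficient implementation would also revisit and adjust the positions of existing knots after each addition.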

6.1.7 Policyholder Age

6.1.8 This factor suggests two knots at ages 17, 49, and rejects a third one at 53 (see FIG. 5).

6.1.9 Standard Error values would accept these two knots and reject the third, as would F-Tests.

6.1.10 Vehicle Group

6.1.11 This factor suggests two knots at 5 and 19 (see FIG. 6).

6.1.12 The Standard Error test is confusing here: it would accept knot 5 from a spline containing (5, 19), would reject all parameters from splines (5, 19, 2) and (5, 19, 2, 18), and would accept knots 5, 2 from the five knot spline (5, 19, 2, 18, 6).

6.1.13 The F-Test considers the splines (5) similar to (5, 19), (5, 19, 2) and (5, 19, 2, 14), but claims the spline (5, 19, 2, 14, 6) is different from (5, 19, 2, 14) yet similar to (5, 19, 2).

6.1.14 Hence the "Value" measure appears useful as a global absolute statistic. The standard error only describes the certainty of an individual parameter, and becomes difficult to interpret when the SE values of several parameters vary from one model to the next. The F-Test only describes whether two models are significantly different, not whether one is better than the other.

6.1.15 Rating Area

6.1.16 This factor gives two knots at 1 and 29 (see FIG. 7), in agreement with SE and F-Tests.

6.1.17 NCD

6.1.18 This factor gives one knot at 4 (see FIG. 8), in agreement with SE and F-Tests.

7. Noise Reduced Parameters

7.1 Desire for an Amended Set of Parameters

7.1.1 The realisation that the "Case Deleted" Estimate μ_((i)) is a useful noise independent measure, and readily calculated, led to a couple of initial attempts to use it directly to influence the model output estimates.

7.2 First and Second Attempts, Mean Adjustors

7.2.1 The first thought was that the noise in the model output could be reduced by artificially offsetting each data point to remove an equivalent amount, y*_(i)=y_(i)+μ_((i))−μ_(i). These can then be refitted to obtain a new set of estimates, μ*_(i).

7.2.2 The second attempt applied a second tier model to the "Case Deleted" Estimates from the first, y*_(i)=μ_((i)), to try to produce some new estimates μ*_(i) with less noise.

7.2.3 Neither of these produces results which are significantly different from the original estimates. This can be understood by reflecting on the way that GLM models select their parameters by placing them at the "mean" position of the sub-domain for each parameter. Hence the data has a symmetry about this mean, and the noise μ_(i)−μ_((i)) reflects this too. So both methods above represent symmetrical adjustments to the data which have little effect on the new estimates.

7.2.4 Consider the example illustrated in FIG. 9. Here we have a well populated domain with data points on the left defining a value of μ_(i) shown as the lower dashed line. Then a new parameter based solely upon two data points y₁, y₂ is considered; this will move the ordinary estimates to the mid-point of the two points, shown as μ*_(i), the upper dashed line. With this parameter included, the Case Deleted model for y₁ will produce μ*₍₁₎=y₂ and similarly the Case Deleted model for y₂ will produce μ*₍₂₎=y₁. The Deviances calculated for this parameter will show SD(y_(i),μ_(i))>SD*(y_(i),μ*_(i)), but here the "Case Deleted" Deviance will be substantially worse: CDD*(y_(i),μ*_((i)))>>SD(y_(i),μ_(i)). The symmetry of the adjustments can be seen easily here, and hence, despite the failure of the extra parameter to add value, we can see why its value remains unchanged.

7.3 The Need for a Variance Penalty Function to Drive the Adjustor

7.3.1 Looking again at the formulation of the "Case Deleted" Estimates μ_((i)), notice that they involve terms representing the mean μ_(i) and, through the Hat diagonal h_(i), the variance. Instead, therefore, we need to develop a penalty function to reward the model for good mean values and penalise it for increased variance.

7.3.2 However we cannot simply replace μ_(i) with μ_((i)) in the likelihood and refit, since the extra deviance introduced already possesses the symmetry above, and hence there is little impact on the parameter values by this method.

7.3.3 Now let us focus instead on a more direct penalty function. Take the results of the free fit μ_(i) with corresponding μ_((i)). Now consider that the variance introduced by a parameter, as expressed by the Variance/Covariance matrix, will be scaled if the parameter itself is artificially scaled. Specifically, the impact on the covariances will allow the model to rebalance in the presence of correlated parameters.

7.3.4 The Variance/Covariance matrix itself will adjust simply according to the normal result for scaled variances, Var(λY_(i))=λ²Var(Y_(i)). In this case the elements of the Variance/Covariance matrix need to be replaced with

$C_{jk}^{*} = \begin{cases} {Var}\left( \lambda_{j}\beta_{j} \right) = \lambda_{j}^{2}{Var}\left( \beta_{j} \right), & j = k \\ {Cov}\left( \lambda_{j}\beta_{j},\lambda_{k}\beta_{k} \right) = \lambda_{j}\lambda_{k}{Cov}\left( \beta_{j},\beta_{k} \right), & j \neq k \end{cases} \quad \text{where} \quad \lambda_{j} = \frac{\beta_{j}^{*}}{\beta_{j}}.$

7.3.5 From this a scaled version of the Hat diagonal can be calculated.

$h_{i}^{*} = {{\sum\limits_{jk}\; {X_{ij}C_{jk}^{*}X_{ik}W_{i}}} = {\sum\limits_{jk}\; \frac{X_{ij}\beta_{j}^{*}C_{jk}X_{ik}\beta_{k}^{*}W_{i}}{\beta_{j}\beta_{k}}}}$

which produces new Linear Predictors

$\eta_{(i)}^{*} = {\eta_{i}^{*} - {\left( \frac{h_{i}^{*}}{1 - h_{i}^{*}} \right){g^{\prime}\left( \mu_{i} \right)}\left( {y_{i} - \mu_{i}} \right)}}$

and “Case Deleted” Estimates μ*_((i))=g⁻¹(η*_((i)))
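A sketch of the scaling in 7.3.4 and 7.3.5, assuming the fitted quantities are available as NumPy arrays, with a log link by default (the function name and defaults are our own, not the specification's):

```python
import numpy as np

def noise_reduced_case_deleted(beta, beta_star, C, X, W, y, mu,
                               inv_link=np.exp,
                               g_prime=lambda mu: 1.0 / mu):
    """Scaled hat diagonal and "Case Deleted" Estimates for scaled
    parameters beta_star (sketch; log-link defaults assumed)."""
    lam = beta_star / beta                    # scale factors lambda_j
    C_star = np.outer(lam, lam) * C           # Var/Cov scaled by lambda_j lambda_k
    h_star = np.einsum("ij,jk,ik->i", X, C_star, X) * W
    eta_star = X @ beta_star                  # new linear predictors
    eta_star_del = eta_star - (h_star / (1 - h_star)) * g_prime(mu) * (y - mu)
    return inv_link(eta_star_del)             # mu*_(i) = g^{-1}(eta*_(i))
```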

7.4 Idea of a Model Depreciation Index

7.4.1 To draw an analogy, the value of a model is like that of a used car. The instant it rolls off the forecourt it loses a chunk of its predictive power, simply by virtue of the fact that it is now being used on new data rather than measured in a circular fashion against the data used to define it.

7.4.2 As time passes, the value decreases further, as was illustrated by a model validation working party organised as part of the Institute of Actuaries GIRO Conference 2009. FIG. 10 is an extract from page 10 of their report (Berry J, et al).

7.4.3 The Noise Reduction technique provides an indication of that initial depreciation, by reference to the scale factors which have been derived.

7.4.4 Without applying the scale factors, deploying the full model would result in a worse model than the scaled one.

8. Calculation of Noise Reduced Model

8.1 Specification of Penalty Function and Two Tier Modelling Process

8.1.1 First obtain the results of the normal Generalized Linear Model fit, as outlined in A.1.10. Next calculate the "Case Deleted" Estimates

$\eta_{(i)} = {\eta_{i} - {\left( \frac{h_{i}}{1 - h_{i}} \right){g^{\prime}\left( \mu_{i} \right)}{\left( {y_{i} - \mu_{i}} \right).}}}$

We now use the superscript * to denote the new parameters and estimates β*_(j), η*_(i), μ*_(i) which we will estimate from the new penalty function.

8.1.2 The Hat diagonal h_(i) is a measure of the influence attaching to the data point y_(i), with (1−h_(i)) the influence of the remaining points. This includes the effect of the Variance of parameter β_(j), Var(β_(j)), and the Covariance of this with the other parameters, Cov(β_(j),β_(k)). Now suppose that β_(j) is scaled back to a value β*_(j); this will reduce the variance to

${{Var}\left( \beta_{j}^{*} \right)} = {\left( \frac{\beta_{j}^{*}}{\beta_{j}} \right)^{2}{{Var}\left( \beta_{j} \right)}}$

and the Covariances to

$C_{jk}^{*} = {{{Cov}\left( {\beta_{j}^{*},\beta_{k}^{*}} \right)} = {\left( {\frac{\beta_{j}^{*}}{\beta_{j}}\frac{\beta_{k}^{*}}{\beta_{k}}} \right){{{Cov}\left( {\beta_{j},\beta_{k}} \right)}.}}}$

These are not the same as the variance results that would occur from a model which had generated these parameter values directly. Using these values we can scale back the "Case Deleted" Estimates that would apply to the new parameters.

$\eta_{(i)}^{*} = \eta_{i}^{*} - \left( \frac{h_{i}^{*}}{1 - h_{i}^{*}} \right) g^{\prime}\left( \mu_{i} \right)\left( y_{i} - \mu_{i} \right), \quad \text{where} \quad h_{i}^{*} = \sum\limits_{jk} \frac{X_{ij}\beta_{j}^{*}C_{jk}X_{ik}\beta_{k}^{*}W_{i}}{\beta_{j}\beta_{k}}.$

8.2 Non-linear Model Algorithm

8.2.1 Now using the results developed in Appendix B, with the new definition of F*_(i) being

$F_{i}^{*} = {\eta_{(i)}^{*} = {\eta_{i}^{*} - {\left( \frac{h_{i}^{*}}{1 - h_{i}^{*}} \right){g^{\prime}\left( \mu_{i} \right)}\left( {y_{i} - \mu_{i}} \right)}}}$

8.2.2 Notation will gain super- and subscripts as required, giving a new objective of

$l_{(i)}^{*} = {{l\left( {y_{i},\theta_{(i)}^{*}} \right)} = {{\sum\limits_{i}\; {\frac{\omega_{i}}{\phi}\left( {{y_{i}\theta_{(i)}^{*}} - {a\left( \theta_{(i)}^{*} \right)}} \right)}} + {b\left( {y_{i},\phi} \right)}}}$

8.2.3 Score Statistic of

$U_{j}^{*} = \frac{\partial l_{(i)}^{*}}{\partial\beta_{j}^{*}} = \sum\limits_{i} \frac{\partial l_{(i)}^{*}}{\partial\theta_{(i)}^{*}}\frac{\partial\theta_{(i)}^{*}}{\partial\mu_{(i)}^{*}}\frac{\partial\mu_{(i)}^{*}}{\partial\eta_{(i)}^{*}}\frac{\partial\eta_{(i)}^{*}}{\partial\beta_{j}^{*}}, \quad \text{with}$

$\frac{\partial l_{(i)}^{*}}{\partial\theta_{(i)}^{*}} = \sum\limits_{i} \frac{\omega_{i}}{\phi}\left( y_{i} - \mu_{(i)}^{*} \right), \qquad \frac{\partial\mu_{(i)}^{*}}{\partial\theta_{(i)}^{*}} = a^{''}\left( \theta_{(i)}^{*} \right) = V\left( \mu_{(i)}^{*} \right), \qquad \frac{\partial\eta_{(i)}^{*}}{\partial\mu_{(i)}^{*}} = g^{\prime}\left( \mu_{(i)}^{*} \right).$

8.2.4 Calculating

$F_{ij}^{\prime*} = \frac{\partial\eta_{(i)}^{*}}{\partial\beta_{j}^{*}} = X_{ij} - \left( \frac{H_{ij}^{\prime*}}{\left( 1 - h_{i}^{*} \right)^{2}} \right) g^{\prime}\left( \mu_{i} \right)\left( y_{i} - \mu_{i} \right), \quad \text{with} \quad H_{ij}^{\prime*} = \sum\limits_{k} \frac{2\, X_{ij}C_{jk}X_{ik}\beta_{k}^{*}W_{i}}{\beta_{j}\beta_{k}}.$

8.2.5 From B.1.3 we have

$\mspace{20mu} {U_{j}^{*} = {\frac{\partial l_{(i)}^{*}}{\partial\beta_{j}^{*}} = {{\sum\limits_{i}\; \frac{\omega_{i}{F_{ij}^{\prime*}\left( {y_{i} - \mu_{(i)}^{*}} \right)}}{{{\phi g}^{\prime}\left( \mu_{(i)}^{*} \right)}{V\left( \mu_{(i)}^{*} \right)}}} = {\sum\limits_{i}\; {F_{ij}^{\prime*}W_{(i)}^{*}{g^{\prime}\left( \mu_{(i)}^{*} \right)}\left( {y_{i} - \mu_{(i)}^{*}} \right)}}}}}$  where$\mspace{20mu} {W_{(i)}^{*} = \frac{\omega_{i}}{{\phi \left( {g^{\prime}\left( \mu_{(i)}^{*} \right)} \right)}^{2}{V\left( \mu_{(i)}^{*} \right)}}}$$U_{jk}^{\prime*} = {\frac{\partial U_{j}^{*}}{\partial\beta_{k}} = {{\sum\limits_{i}\; {\left( {{F_{ijk}^{''*}W_{(i)}^{*}{g^{\prime}\left( \mu_{(i)}^{*} \right)}} + {F_{ij}^{\prime*}\frac{\partial\left( {W_{(i)}^{*}{g^{\prime}\left( \mu_{(i)}^{*} \right)}} \right)}{\partial\beta_{k}}}} \right)\left( {y_{i} - \mu_{(i)}^{*}} \right)}} - {F_{ij}^{\prime*}W_{(i)}^{*}F_{ik}^{\prime*}}}}$  where$\mspace{20mu} {F_{ijk}^{''*} = {{{- \left( \frac{H_{ijk}^{''*}}{\left( {1 - h_{i}^{*}} \right)^{2}} \right)}{g^{\prime}\left( u_{i} \right)}\left( {y_{i} - \mu_{i}} \right)} - {2\left( \frac{H_{ij}^{\prime*}H_{ik}^{\prime*}}{\left( {1 - h_{i}^{*}} \right)^{3}} \right){g^{\prime}\left( u_{i} \right)}\left( {y_{i} - \mu_{i}} \right)}}}$$\mspace{20mu} {{{and}\mspace{14mu} H_{ijk}^{''*}} = \frac{2\; X_{ij}C_{jk}X_{ik}W_{i}}{\beta_{j}\beta_{k}}}$

8.2.7 In this case the matrix U′*_(jk) has not been decomposed into eigenvectors.

$^{m + 1}\beta_{j}^{*} = {}^{m}\beta_{j}^{*} + \sum\limits_{ik} \left( U_{jk}^{\prime*} \right)^{-1} F_{ik}^{\prime*} W_{(i)}^{*} g^{\prime}\left( \mu_{(i)}^{*} \right)\left( y_{i} - \mu_{(i)}^{*} \right)$
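In place of the full Newton scheme above, the same objective can be sketched with a generic optimiser: choose scale factors λ_(j) minimising the deviance of the noise reduced "Case Deleted" Estimates. The use of scipy.optimize and a Poisson deviance here are our assumptions for illustration, not the specification's algorithm:

```python
import numpy as np
from scipy.optimize import minimize

def poisson_deviance(y, mu):
    # Poisson deviance, handling the y = 0 limit of y*log(y/mu)
    term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

def fit_noise_reduced(beta, C, X, W, y, mu):
    """Choose scale factors lambda_j minimising the "Case Deleted" Deviance
    of the scaled model (sketch; log link and Poisson response assumed)."""
    def objective(lam):
        C_star = np.outer(lam, lam) * C                  # scaled Var/Cov matrix
        h_star = np.einsum("ij,jk,ik->i", X, C_star, X) * W
        eta_star = X @ (lam * beta)                      # scaled linear predictors
        eta_del = eta_star - (h_star / (1 - h_star)) * (1.0 / mu) * (y - mu)
        return poisson_deviance(y, np.exp(eta_del))      # case deleted estimates
    res = minimize(objective, x0=np.ones_like(beta), method="Nelder-Mead",
                   options={"maxiter": 5000})
    return res.x * beta                                  # noise reduced parameters
```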

9. Worked Example

9.1 Log Poisson Frequency Model

9.1.1 This example is taken from a Motor Third Party Bodily Injury example dataset. This model has a large sample size of 500,000 with around 30,000 responses.

9.1.2 A full complexity model was built upon the data, using 31 factors with 54 parameters, of which 8 were interactions.

9.1.3 FIG. 11 shows the relationship between the Standard Error (x-axis) reported by the GLM and the Scale Factor (y-axis) recommended by the Noise Reduction technique.

9.1.4 A few parameters were retained beyond the normal acceptance threshold, to show the fall-off between higher errors and the scale factor.

9.1.5 FIG. 12 shows the ratio of the two models (x-axis), and the average observed response and model prediction values (y-axis), plus the exposure as bars (2nd y-axis). The models here have been fitted on the training dataset, and then rescored against the hold-out dataset. The chart then measures their value against observed data from the hold-out dataset.

9.1.6 The models show varying predictions, with a ratio substantially between +/−5%. The noise reduced model produces predictions which are scaled towards the mean, which tempers the predictions made by the GLM at the extremes of the distribution.

9.1.7 Using a simple business model with a price comparison website level of elasticity fixed at 10 shows a profit margin improvement in this example of 0.57% at constant volumes.

9.2 Log Gamma Severity Model

9.2.1 This example (see FIGS. 13 and 14) is taken from a Motor Accidental Damage Severity example dataset. To contrast with the previous frequency model, a sample size of 12,000 was used with an average response of 1,450.

9.2.2 A full complexity model was built upon the data, using 18 factors with 59 parameters, of which 17 were interactions.

9.2.3 Using a simple business model with a price comparison website level of elasticity fixed at 10 shows a profit margin improvement in this example of 0.69% at constant volumes.

9.3 Logit Binomial Proportion of Collisions with Bodily Injury Model

9.3.1 This example (see FIGS. 15 and 16) is a propensity model built on a Motor dataset using collision as the exposure measure, and the proportion of Bodily Injuries on the claim as the response. Such an approach is sometimes used to increase the patterns detected in sparse Bodily Injury data. The sample size was 22,000.

9.3.2 The model uses 19 factors with 108 parameters and no interactions.

9.3.3 Using a simple business model with a price comparison website level of elasticity fixed at 10 shows a large profit margin improvement in this example of 3.4% at constant volumes.

9.4 Poor Model

9.4.1 In this example a particularly poor set of parameters was retained to find out how effective the technique was at removing ones that are not significant. FIG. 17 shows that scale factors quite close to zero are achieved. The resultant model, however, was still very poor, as the technique does nothing to add significant factors which are missing from the original model.

10. Examples of Practical Applications for the Present Methods

The following are examples of the types of model where the present techniques are applicable to provide more accurately predictive models.

Claim frequency—This type of model will use physical characteristics of the insured object, such as type of vehicle, engine power, age of vehicle and so on, and use these to determine how many of them will crash in a given year.

This knowledge is useful not just for the purposes of setting the insurance premium itself, but also guides the capacity of repair garages used to repair the cars.

Claims cost—This model is a similar concept to claims frequency, except that the purpose is to determine the amount of damage per vehicle. Value of vehicle and cost of repair parts will be additional factors in this model.

Again the values can be used to define insurance costs, but in addition the amount of damage relative to the vehicle value is a key determining factor in deciding if a damaged vehicle should be repaired or scrapped.

Propensity—These types of model estimate the likelihood of an event occurring. There are many types that might be produced, and hence a wide variety of applications, including insurance and a wide range of other scenarios.

For example, a propensity model may be developed to determine if a person will respond to a piece of mail. This, multiplied by the value of the response, gives the benefit of mailing a person, and can be compared to the cost of the operation. Increasing use of this technique results in fewer blanket junk-mail activities, and better targeted mail towards those who want to receive it.

Another example is generating a model to predict the chance that a person will renew a product this year. This may be used to target price discounts and other rewards to the undecided customers.

A further example is generating a model to determine the chance that a post-operative patient discharged today will need to be readmitted later with complications. Hospitals are under increased pressure to discharge patients early to free up beds, and for some patients this is beneficial as they will recover better at home with family. However, for others (depending on the operation type, and patient age and history, for example), early discharge could result in relapse, and longer, more expensive care later. Hence the ability to weigh up all the influencing factors to make the best decision can improve care and aid difficult decisions around the allocation of resources.

In all of these examples the model is used to provide information where the outcome depends on several (possibly very many) factors. The present methods provide an adjustment to the results which makes them more predictive of future outcomes, by insulating them from noise in the visible data.

This model information is used to make real-world choices about the allocation of resources, such as whether to fix a car or scrap it, to mail a person or leave them in peace, or to discharge a patient or keep them in hospital.

Before turning to the process flow diagrams of FIGS. 18 and 19, it is noted that embodiments described herein may be practiced using an alternative order of the steps illustrated in FIGS. 18 and 19. That is, the process flows illustrated in FIGS. 18 and 19 are provided as examples only, and the embodiments may be practiced using process flows that differ from those illustrated. Additionally, it is noted that not all steps are required in every embodiment. In other words, one or more of the steps may be omitted or replaced, without departing from the spirit and scope of the embodiments. Further, steps may be performed in different orders, in parallel with one another, or omitted entirely, and/or certain additional steps may be performed without departing from the scope and spirit of the embodiments.

Turning to FIG. 18, an example embodiment of a method 1800 for analysing input data using a processor is described. In certain embodiments, the input data comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity. The analysis generates a model for predicting an outcome value for a further physical entity based on input data comprising attribute values associated with the further physical entity.

In FIG. 18, the method 1800 includes receiving input data and storing the input data in an electronic data storage at step 1810. The method 1800 further includes retrieving the input data from the data storage and processing the input data using a statistical modelling method to generate a model based on the input data at step 1820. In various embodiments, the model generated in step 1820 may comprise any of the models described above or combinations thereof. For example, as discussed above, the statistical modelling method may generate the model as a Generalised Linear Model or a Generalised Non-linear Model, among other embodiments.

Proceeding to step 1830, the method 1800 further includes calculating a case deleted estimate of the outcome value for each of the set of physical entities. For example, calculating case deleted estimates at step 1830 may be performed according to the methods, models, and calculations described above in sections 2 and 3. At step 1840, the method 1800 proceeds to calculating a measure of deviance of the case deleted estimates from the outcome values of the input data. At step 1850, the method includes outputting the measure of deviance to data storage for retrieval by a user.

In certain other embodiments, the method 1800 may further include calculating a number and location of knots to include in the model to minimise the measure of deviance at step 1860. For example, calculating knots at step 1860 may be performed according to the methods, models, and calculations described above. Additionally or alternatively, the method 1800 may further include identifying at least one attribute to omit from the model based on an associated deviance measure of the at least one attribute and removing the at least one attribute from the model at step 1870, according to the methods, models, and calculations described above.

Turning to FIG. 19, an example embodiment of a method 1900 for analysing input data using a processor is described. In certain embodiments, the input data comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity. The analysis generates a model for predicting an outcome value for a further physical entity based on input data comprising attribute values associated with the further physical entity.

In FIG. 19, the method 1900 includes receiving input data and storing the input data in an electronic data storage at step 1910. The method 1900 further includes, at step 1920, retrieving the input data from the data storage and processing the input data using a statistical modelling method to generate an intermediate model based on the input data, the intermediate model comprising parameter estimates and a variance/covariance matrix. In various embodiments, the intermediate model generated in step 1920 may comprise any of the models described above or combinations thereof. For example, as discussed above, the statistical modelling method may generate the model as a Generalised Linear Model or a Generalised Non-linear Model, among other embodiments.

Proceeding to step 1930, the method 1900 further includes calculating a case deleted estimate of the outcome value for each of the set of physical entities based on the intermediate model. For example, calculating case deleted estimates at step 1930 may be performed according to the methods, models, and calculations described above in sections 2 and 3.

In one embodiment, calculating the case deleted estimates at step 1930 includes calculating, for each entity, the case deleted estimate directly and without running the intermediate model for the entity on the basis of the input data with the data associated with the entity omitted. In another embodiment, case deleted estimates may be calculated at step 1930 directly for each entity by calculating case deleted linear predictors and deriving the case deleted estimates therefrom using an inverse link function. In this embodiment, the case deleted linear predictor is calculated for each entity by adjusting a linear predictor provided by the intermediate model by subtracting an amount corresponding to an influence on the model caused by the outcome value for the entity, as also described above, wherein the influence on the model caused by the outcome value for an entity is calculated by multiplying a distance from the model to the respective outcome value by an influence factor and by a rate of change of the linear predictor by the estimate. In still another embodiment, calculating the case deleted estimates at step 1930 includes calculating, for each entity, a case deleted estimate by running the intermediate model based on the input data with the attribute values associated with the entity omitted to generate a respective set of case deleted model parameters.

At step 1940, the method 1900 proceeds to generating a noise reduced model comprising noise reduced parameters, a noise reduced variance/covariance matrix, and noise reduced case deleted estimates using an iterative process to minimise a measure of deviance of the noise reduced case deleted estimates from the outcome values of the input data. The generation of the noise reduced model may, in various embodiments, be performed according to the methods, models, and calculations described above.

In certain other embodiments, the method 1900 may further include calculating a number and location of knots to include in the noise reduced model to minimise the measure of deviance at step 1950. For example, calculating knots at step 1950 may be performed according to the methods, models, and calculations described above. Additionally or alternatively, in other embodiments, the method 1900 may further include identifying at least one attribute to omit from the model based on a measure of deviance of the at least one attribute relative to the noise reduced model and removing the at least one attribute from the model at step 1960, according to the methods, models, and calculations described above.

Turning to FIG. 20, an example hardware circuit diagram of a general purpose computing device 2000 is described. The computing device 2000 includes a processor 2010 and a data storage 2020. In various embodiments, the processor 2010 comprises any well known general purpose arithmetic processor, for example. The data storage 2020 comprises any well known memory device or tangible computer-readable medium that stores computer-readable instructions to be executed by the processor 2010. The data storage 2020 stores computer-readable instructions thereon that, when executed by the processor 2010, direct the processor 2010 to execute various aspects of the present invention described herein, such as the methods 1800 and 1900 described above, for example. In operation, the processor 2010 is configured to retrieve computer-readable instructions stored on the data storage 2020 and execute the computer-readable instructions to implement various aspects and features of the present invention. For example, the processor 2010 may be adapted and configured to execute the processes described above with reference to FIGS. 18 and 19.

APPENDICES

Appendix A. Generalized Linear Models

A.1 Derivation and Notation

A.1.1 The following derivation is drawn from Anderson et al. and Dobson and, although well known, is included so that the non-linear variant can be derived using the same notation in the main body of the text.

A.1.2 Let Y_(i) be a series of random variables belonging to the exponential family of distributions, expressed in canonical form with natural parameter θ_(i) by the pdf

${f\left( {y_{i},\theta_{i}} \right)} = {\exp \left( {{\frac{\omega_{i}}{\phi}\left( {{y_{i}\theta_{i}} - {a\left( \theta_{i} \right)}} \right)} + {b\left( {y_{i},\phi} \right)}} \right)}$

where ω_(i) is a constant related to Y_(i) representing the weight, which is commonly the exposure for insurance applications, and φ is the scale parameter.

A.1.3 Given

∫f(y_(i), θ_(i))y_(i) = 1  we  have${\int{\frac{\partial}{\partial\theta_{i}}{f\left( {y_{i},\theta_{i}} \right)}}} = {0 = {\int{\frac{\omega_{i}}{\phi}\left( {y_{i} - {a^{\prime}\left( \theta_{i} \right)}} \right){f\left( {y_{i},\theta_{i}} \right)}\mspace{14mu} {and}}}}$${\int{\frac{\partial^{2}}{\partial\theta_{i}^{2}}{f\left( {y_{i},\theta_{i}} \right)}}} = {0 = {\int{\left\lbrack {{\frac{\omega_{i}}{\phi}\left( {- {a^{''}\left( \theta_{i} \right)}} \right)} + \left( {\frac{\omega_{i}}{\phi}\left( {y_{i} - {a^{\prime}\left( \theta_{i} \right)}} \right)} \right)^{2}} \right\rbrack {f\left( {y_{i},\theta_{i}} \right)}}}}$

A.1.4 The first of these gives E[Y_(i)]=a′(θ_(i)), and substituting this into the second gives

${a^{''}\left( \theta_{i} \right)} = {{\frac{\omega_{i}}{\phi}{E\left\lbrack \left( {Y_{i} - {E\left\lbrack Y_{i} \right\rbrack}} \right)^{2} \right\rbrack}} = {\frac{\omega_{i}}{\phi}{{Var}\left\lbrack Y_{i} \right\rbrack}}}$we  define μ_(i) = E[Y_(i)] = a^(′)(θ_(i))  and${V\left( \mu_{i} \right)} = {{a^{''}\left( \theta_{i} \right)} = {{a^{''}\left( {a^{\prime - 1}\left( \mu_{i} \right)} \right)} = {\frac{\omega_{i}}{\phi}{{Var}\left\lbrack Y_{i} \right\rbrack}}}}$

A.1.5 Let the log likelihood function be denoted by

${l\left( {y_{i},\theta_{i}} \right)} = {{\sum\limits_{i}\; {\frac{\omega_{i}}{\phi}\left( {{y_{i}\theta_{i}} - {a\left( \theta_{i} \right)}} \right)}} + {b\left( {y_{i},\phi} \right)}}$

A.1.6 Further define the linear predictor and the link function for the model η_(i)=g(μ_(i)), where the linear predictor is a linear combination of the parameters

$\eta_{i} = {\sum\limits_{j}\; {X_{ij}{\beta_{j}.}}}$

A.1.7 First we define the score statistic

$U_{j} = \frac{\partial l}{\partial\beta_{j}}$

and obtain the result by deriving each of the following terms in order:

${U_{j} = {\sum\limits_{i}{\frac{\partial l_{i}}{\partial\theta_{i}}\frac{\partial\theta_{i}}{\partial\mu_{i}}\frac{\partial\mu_{i}}{\partial\eta_{i}}\frac{\partial\eta_{i}}{\partial\beta_{j}}}}},{\frac{\partial l_{i}}{\partial\theta_{i}} = {{\frac{\omega_{i}}{\phi}\left( {y_{i} - {a^{\prime}\left( \theta_{i} \right)}} \right)} = {\frac{\omega_{i}}{\phi}\left( {y_{i} - \mu_{i}} \right)}}},{\frac{\partial\mu_{i}}{\partial\theta_{i}} = {{a^{''}\left( \theta_{i} \right)} = {V\left( \mu_{i} \right)}}},{\frac{\partial\eta_{i}}{\partial\mu_{i}} = {g^{\prime}\left( \mu_{i} \right)}},{\frac{\partial\eta_{i}}{\partial\beta_{j}} = {X_{ij}.}}$

Giving

$\begin{matrix}{U_{j} = \frac{\partial l}{\partial\beta_{j}}} \\{= {\sum\limits_{i}\frac{\omega_{i}{X_{ij}\left( {y_{i} - \mu_{i}} \right)}}{\phi \; {g^{\prime}\left( \mu_{i} \right)}{V\left( \mu_{i} \right)}}}} \\{= {\sum\limits_{i}{W_{i}{g^{\prime}\left( \mu_{i} \right)}{X_{ij}\left( {y_{i} - \mu_{i}} \right)}}}}\end{matrix}$${{where}\mspace{14mu} W_{i}} = \frac{\omega_{i}}{{\phi \left( {g^{\prime}\left( \mu_{i} \right)} \right)}^{2}{V\left( \mu_{i} \right)}}$

for reasons that will become clearer below. Note also that

$\begin{matrix}{{E\left\lbrack U_{j} \right\rbrack} = {E\left\lbrack {\sum\limits_{i}{W_{i}{g^{\prime}\left( \mu_{i} \right)}{X_{ij}\left( {y_{i} - \mu_{i}} \right)}}} \right\rbrack}} \\{= {\sum\limits_{i}{W_{i}{g^{\prime}\left( \mu_{i} \right)}{X_{ij}\left( {{E\left\lbrack Y_{i} \right\rbrack} - \mu_{i}} \right)}}}} \\{= 0}\end{matrix}$

A.1.8 Next Dobson derives an approximation by first defining the information matrix

J_(jk) = Cov(U_(j), U_(k)), and  using  E⌊U_(j)⌋ = 0$\begin{matrix}{J_{jk} = {E\left\lbrack {\left( {U_{j} - {E\left\lbrack U_{j} \right\rbrack}} \right)\left( {U_{k} - {E\left\lbrack U_{k} \right\rbrack}} \right)} \right\rbrack}} \\{= {E\left\lbrack {U_{j}U_{k}} \right\rbrack}} \\{= {\sum\limits_{i}{\left( \frac{\omega_{i}}{\phi \; {g^{\prime}\left( \mu_{i} \right)}{V\left( \mu_{i} \right)}} \right)^{2}X_{ij}X_{ik}{E\left\lbrack \left( {Y_{i} - \mu_{i}} \right)^{2} \right\rbrack}}}}\end{matrix}$ $\begin{matrix}{J_{jk} = {\sum\limits_{i}{\left( \frac{\omega_{i}}{\phi \; {g^{\prime}\left( \mu_{i} \right)}{V\left( \mu_{i} \right)}} \right)^{2}X_{ij}X_{ik}{{Var}\left\lbrack Y_{i} \right\rbrack}}}} \\{= {\sum\limits_{i}{\frac{\omega_{i}}{\phi \; \left( {g^{\prime}\left( \mu_{i} \right)} \right)^{2}{V\left( \mu_{i} \right)}}X_{ij}X_{ik}}}} \\{= {\sum\limits_{i}{X_{ij}W_{i}X_{ik}}}}\end{matrix}$

A.1.9 To solve for the parameters in the general case we use an extension of the Newton Raphson formula

$\;^{m + 1}\beta_{j} =^{m}{\beta_{j}\mspace{11mu} - \; {\sum\limits_{k}{{\left( {}^{m}U_{jk}^{\prime} \right)^{- 1}}^{m}U_{k}}}}$

to find the root of

$\begin{matrix}{{\sum\limits_{j}U_{j}} = {0U_{jk}^{\prime}}} \\{= \frac{\partial U_{j}}{\partial\beta_{k}}} \\{= {{\sum\limits_{i}{\frac{\partial\left( {W_{i}{g^{\prime}\left( \mu_{i} \right)}} \right)}{\partial\beta_{k}}{X_{ij}\left( {y_{i} - \mu_{i}} \right)}}} +}} \\{{W_{i}{g^{\prime}\left( \mu_{i} \right)}{{X_{ij}\left( {\frac{\partial\left( {y_{i} - \mu_{i}} \right)}{\partial\eta_{i}}\frac{\partial\eta_{i}}{\partial\beta_{k}}} \right)}.}}}\end{matrix}$

Near the stationary point we are seeking,

$\sum\limits_{i}\; {\omega_{i}{X_{ij}\left( {y_{i} - \mu_{i}} \right)}}$

will be close to zero. For the structures noted in 5.2.3 this will be exactly zero, and

${{{g^{\prime}\left( \mu_{i} \right)}{V\left( \mu_{i} \right)}} = 1},{{{giving}\mspace{14mu} \frac{\partial\left( {W_{i}{g^{\prime}\left( \mu_{i} \right)}} \right)}{\partial\beta_{k}}} = 0.}$

Hence the first term is normally ignored.

$\begin{matrix}{U_{jk}^{\prime} = \frac{\partial U_{j}}{\partial\beta_{k}}} \\{= {\sum\limits_{i}{{- W_{i}}{g^{\prime}\left( \mu_{i} \right)}{X_{ij}\left( {\frac{\partial\mu_{i}}{\partial\eta_{i}}\frac{\partial\eta_{i}}{\partial\beta_{k}}} \right)}}}} \\{= {\sum\limits_{i}{{- X_{ij}}W_{i}X_{ik}}}} \\{= {- J_{jk}}}\end{matrix}$

A.1.10 Then we obtain the usual formula for iteration m, where

$\;^{m + 1}{\hat{\beta}}_{j} =^{m}{{\hat{\beta}}_{j} + {\sum\limits_{ik}{\left( {\sum\limits_{p}\; {X_{pj}^{m}W_{p}X_{p\; k}}} \right)^{- 1}X_{i\; k}^{m}W_{i}{g^{\prime}\left( {}^{m}\mu_{i} \right)}\left( {y_{i} -^{m}\mu_{i}} \right)}}}$

sometimes written as

$\;^{m + 1}{\hat{\beta}}_{j} = {\sum\limits_{ik}{\left( {\sum\limits_{p}\; {X_{pj}^{m}W_{p}X_{p\; k}}} \right)^{- 1}X_{i\; k}^{m}{W_{i}^{m}\left( {\eta_{i} + {{g^{\prime}\left( {}^{m}\mu_{i} \right)}\left( {y_{i} -^{m}\mu_{i}} \right)}} \right)}}}$

A.1.11 From these results the Variance-Covariance matrix is available

$C_{jk} = \left( {\sum\limits_{p}\; {X_{pj}^{m}W_{p}X_{p\; k}}} \right)^{- 1}$

along with the Hat Matrix

$H_{ip} = {\sum\limits_{jk}\; {W_{i}^{1/2}X_{ij}C_{jk}X_{p\; k}W_{p}^{1/2}}}$

and the Hat diagonal h_(i)=H_(ii).
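Continuing the same hypothetical sketch, the quantities of A.1.11 can be computed directly from the converged fit, and the hat diagonal then gives the case deleted linear predictors η_((i)) = η_(i) − (h_(i)/(1 − h_(i)))g′(μ_(i))(y_(i) − μ_(i)) recited later in claim 12 (again our illustration, with φ = 1 and the log link):

```python
import numpy as np

def hat_and_case_deleted(X, beta, y, omega=None):
    """Variance-covariance matrix C, hat diagonals h_i, and case deleted
    linear predictors for the fitted Poisson/log-link sketch above."""
    n, _ = X.shape
    omega = np.ones(n) if omega is None else np.asarray(omega, float)
    eta = X @ beta
    mu = np.exp(eta)
    W = omega * mu
    C = np.linalg.inv((X.T * W) @ X)             # C_jk of A.1.11
    h = W * np.einsum('ij,jk,ik->i', X, C, X)    # h_i = W_i sum_jk X_ij C_jk X_ik
    eta_del = eta - (h / (1.0 - h)) * (y - mu) / mu   # g'(mu) = 1/mu for log link
    return C, h, eta_del
```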

Appendix B. Non-Linear Model Algorithm

B.1 Derivation and Validity

B.1.1 Taking the standard Generalized Linear Model form

$\eta_{i} = {{g\left( \mu_{i} \right)} = {\sum\limits_{j}\; {X_{ij}\beta_{j}}}}$

and adding an extra function to represent a non-linear variant of the model, so that now η_(i)=g(μ_(i))=F(X_(ij),β_(j)), denoted F_(i).
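For instance (our illustration, not a structure prescribed by the text), a rating model with a fitted power term

$\eta_{i} = {F\left( {X_{ij},\beta_{j}} \right)} = {\beta_{1}X_{i1} + \beta_{2}X_{i2}^{\beta_{3}}}$

is non-linear in β_(3), so no re-specification of the design matrix reduces it to the linear form of Appendix A.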

B.1.2 Let the log likelihood function and score statistics be the same as those of Appendix A above.

${l\left( {y_{i},\theta_{i}} \right)} = {{\sum\limits_{i}{\frac{\omega_{i}}{\phi}\left( {{y_{i}\theta_{i}} - {a\left( \theta_{i} \right)}} \right)}} + {{b\left( {y_{i},\phi} \right)}\mspace{14mu} {and}}}$$U_{j} = {\frac{\partial l}{\partial\beta_{j}} = {\sum\limits_{i}{\frac{\partial l_{i}}{\partial\theta_{i}}\frac{\partial\theta_{i}}{\partial\mu_{i}}\frac{\partial\mu_{i}}{\partial\eta_{i}}\frac{\partial\eta_{i}}{\partial\beta_{j}}}}}$

B.1.3 The first three terms are the same as in Appendix A

${\frac{\partial l_{i}}{\partial\theta_{i}} = {{\frac{\omega_{i}}{\phi}\left( {y_{i} - {a^{\prime}\left( \theta_{i} \right)}} \right)} = {\frac{\omega_{i}}{\phi}\left( {y_{i} - \mu_{i}} \right)}}},{\frac{\partial\mu_{i}}{\partial\theta_{i}} = {{a^{''}\left( \theta_{i} \right)} = {V\left( \mu_{i} \right)}}},{\frac{\partial\eta_{i}}{{\partial\mu_{i}}\;} = {g^{\prime}\left( \mu_{i} \right)}}$

and the final term now becomes

$\frac{\partial\eta_{i}}{\partial\beta_{j}} = {F^{\prime}\left( {X_{ij},\beta_{j}} \right)}$

denoted F′_(ij). Giving

$U_{j} = {\frac{\partial l}{\partial\beta_{j}} = {{\sum\limits_{i}{F_{ij}^{\prime}W_{i}{g^{\prime}\left( \mu_{i} \right)}\left( {y_{i} - \mu_{i}} \right)\mspace{14mu} {with}\mspace{14mu} W_{i}}} = \frac{\omega_{i}}{{\phi \left( {g^{\prime}\left( \mu_{i} \right)} \right)}^{2}{V\left( \mu_{i} \right)}}}}$

as before.

B.1.4 At this point it is tempting to jump to the information matrix

$J_{jk} = {\sum\limits_{i}{F_{ij}^{\prime}W_{i}F_{ik}^{\prime}}}$

making use of the same approximation discussed in A.1.9, U′_(jk)=−J_(jk), which would define the iteration as

$^{m + 1}{\hat{\beta}}_{j} = {}^{m}{\hat{\beta}}_{j} + {\sum\limits_{ik}{\left( {\sum\limits_{p}{F_{pj}^{\prime}\,^{m}W_{p}\,F_{pk}^{\prime}}} \right)^{- 1}F_{ik}^{\prime}\,^{m}W_{i}\,{g^{\prime}\left( {}^{m}\mu_{i} \right)}\left( {y_{i} - {}^{m}\mu_{i}} \right)}}.$

B.1.5 However, first we must calculate

$\begin{matrix}{U_{jk}^{\prime} = \frac{\partial U_{j}}{\partial\beta_{k}}} \\ {= {\sum\limits_{i}\left\lbrack {\left( {{F_{ijk}^{''}W_{i}{g^{\prime}\left( \mu_{i} \right)}} + {F_{ij}^{\prime}\frac{\partial\left( {W_{i}{g^{\prime}\left( \mu_{i} \right)}} \right)}{\partial\beta_{k}}}} \right)\left( {y_{i} - \mu_{i}} \right) + {F_{ij}^{\prime}W_{i}{g^{\prime}\left( \mu_{i} \right)}\left( {- \frac{\partial\mu_{i}}{\partial\eta_{i}}\frac{\partial\eta_{i}}{\partial\beta_{k}}} \right)}} \right\rbrack}} \\ {= {\sum\limits_{i}\left\lbrack {\left( {{F_{ijk}^{''}W_{i}{g^{\prime}\left( \mu_{i} \right)}} + {F_{ij}^{\prime}\frac{\partial\left( {W_{i}{g^{\prime}\left( \mu_{i} \right)}} \right)}{\partial\beta_{k}}}} \right)\left( {y_{i} - \mu_{i}} \right) - {F_{ij}^{\prime}W_{i}F_{ik}^{\prime}}} \right\rbrack}.}\end{matrix}$

Note that the second of these terms represents the usual linear formula

${- J_{jk}} = {- {\sum\limits_{i}{F_{ij}^{\prime}W_{i}{F_{ik}^{\prime}.}}}}$

B.1.6 For the non-linear case we notice that the formula now involves an extra term in F″_(ijk) (which was zero in the linear case

$\left. {F_{ijk}^{''} = {\frac{\partial F_{ij}^{\prime}}{\partial\beta_{k}} = {\frac{\partial X_{ij}}{\partial\beta_{k}} = 0}}} \right).$

Therefore the approximation will be less likely to be sufficiently close along the path we are seeking towards the stationary solution, and this may disrupt the convergence.

B.1.7 In general therefore we must instead fall back to the formula

$^{m + 1}{\hat{\beta}}_{j} = {}^{m}{\hat{\beta}}_{j} - {\sum\limits_{ik}{\left( {}^{m}U_{jk}^{\prime} \right)^{- 1}F_{ik}^{\prime}\,^{m}W_{i}\,{g^{\prime}\left( {}^{m}\mu_{i} \right)}\left( {y_{i} - {}^{m}\mu_{i}} \right)}}$

which will display superior convergence characteristics for a wider range of link function and distribution structures.

B.1.8 Numerical testing has shown cases where B.1.4 diverges rapidly, and B.1.7 converges almost as efficiently as the equivalent linear case.
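As a rough numerical illustration of the difference between the B.1.4 and B.1.7 iterations (our sketch under simplifying assumptions: identity link, normal errors, ω_(i) = 1 and φ = 1, with the derivatives of F taken by finite differences rather than analytically):

```python
import numpy as np

def jacobian(F, X, beta, eps=1e-6):
    """Numerical Jacobian F'_ij = d eta_i / d beta_j."""
    eta0 = F(X, beta)
    J = np.empty((eta0.size, beta.size))
    for j in range(beta.size):
        b = beta.copy()
        b[j] += eps
        J[:, j] = (F(X, b) - eta0) / eps
    return J

def full_newton_step(F, X, y, beta, eps=1e-6):
    """One B.1.7 step with identity link and normal errors, so that
    U_j = sum_i F'_ij (y_i - mu_i) and
    U'_jk = sum_i [F''_ijk (y_i - mu_i) - F'_ij F'_ik]."""
    mu = F(X, beta)
    J = jacobian(F, X, beta, eps)
    U = J.T @ (y - mu)
    Uprime = -J.T @ J                       # the B.1.4 (Gauss-Newton style) part
    for k in range(beta.size):
        b = beta.copy()
        b[k] += eps
        Jk = jacobian(F, X, b, eps)
        Uprime[:, k] += ((Jk - J) / eps).T @ (y - mu)   # the extra F'' term of B.1.5
    return beta - np.linalg.solve(Uprime, U)
```

Dropping the correction loop recovers the B.1.4 update; when F is strongly curved in β the F″ term changes the step enough that, as B.1.8 notes, the uncorrected iteration can diverge while the corrected one converges.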

CLAIMS

1. A method for analysing input data using a processor, wherein the input data comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity, the analysis generates a model for predicting an outcome value for a further physical entity based on input data comprising attribute values associated with the further physical entity, the method comprising: receiving, by the processor, the input data and storing the input data in an electronic data storage; retrieving, by the processor, the input data from the data storage and processing the input data using a statistical modelling method to generate the model based on the input data; calculating, by the processor, a case deleted estimate of the outcome value for each of the set of physical entities; calculating, by the processor, a measure of deviance of the case deleted estimates from the outcome values of the input data; and outputting, by the processor, the measure of deviance to the data storage for retrieval by a user.

2. The method of claim 1, further comprising calculating a number and location of knots to include in the model to minimise the measure of deviance.
3. The method of claim 1, further comprising identifying at least one attribute to omit from the model based on an associated deviance measure of the at least one attribute; and removing the at least one attribute from the model.
4. A method for analysing input data using a processor, wherein the input data comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity, the analysis generates a model for predicting an outcome value for a further physical entity based on input data comprising attribute values associated with the further physical entity, the method comprising: receiving, by the processor, the input data; processing the input data, by the processor, using a statistical modelling method to generate an intermediate model based on the input data, the intermediate model comprising parameter estimates and a variance/covariance matrix; calculating, by the processor, a case deleted estimate of the outcome value for each of the set of physical entities based on the intermediate model; and generating, by the processor, a noise reduced model comprising noise reduced parameters, a noise reduced variance/covariance matrix, and noise reduced case deleted estimates using an iterative process to minimise a measure of deviance of the noise reduced case deleted estimates from the outcome values of the input data.
5. The method of claim 4, wherein generating a noise reduced model further comprises replacing parameters β_(j) in the noise reduced model by noise reduced parameters β*_(j), with the noise reduced variances

${{Var}\left( \beta_{j}^{*} \right)} = {\left( \frac{\beta_{j}^{*}}{\beta_{j}} \right)^{2}{{Var}\left( \beta_{j} \right)}},$

the noise reduced covariances

${{Cov}\left( {\beta_{j}^{*},\beta_{k}^{*}} \right)} = {\left( {\frac{\beta_{j}^{*}}{\beta_{j}}\frac{\beta_{k}^{*}}{\beta_{k}}} \right){{Cov}\left( {\beta_{j},\beta_{k}} \right)}},$

and the noise reduced case deleted linear predictors

$\eta_{(i)}^{*} = {\eta_{i}^{*} - {\left( \frac{h_{i}^{*}}{1 - h_{i}^{*}} \right){g^{\prime}\left( \mu_{i} \right)}\left( {y_{i} - \mu_{i}} \right)}} \quad\text{where}\quad h_{i}^{*} = {\sum\limits_{jk}\frac{X_{ij}\beta_{j}^{*}C_{jk}X_{ik}\beta_{k}^{*}W_{i}}{\beta_{j}\beta_{k}}} \quad\text{and}\quad \frac{\partial\eta_{i}}{\partial\mu_{i}} = {g^{\prime}\left( \mu_{i} \right)}.$

6. The method of claim 4, further comprising calculating a number and location of knots to include in the noise reduced model to minimise the measure of deviance.
7. The method of claim 4, further comprising identifying at least one attribute to omit from the noise reduced model based on a measure of deviance of the at least one attribute relative to the noise reduced model; and removing the at least one attribute from the noise reduced model.
8. The method of claim 4, wherein calculating a case deleted estimate comprises, for each entity, calculating the case deleted estimate directly and without running the intermediate model for the entity on the basis of the input data with the data associated with the entity omitted.
9. The method of claim 4, wherein the case deleted estimates are calculated directly for each entity by calculating case deleted linear predictors and deriving the case deleted estimates therefrom using an inverse link function.
10. The method of claim 9, wherein, for each entity, the case deleted linear predictor is calculated by adjusting a linear predictor provided by the intermediate model by subtracting an amount corresponding to an influence on the model caused by the outcome value for the entity.
11. The method of claim 10, wherein the influence on the model caused by the outcome value for an entity is calculated by multiplying a distance from the model to the respective outcome value by an influence factor and by a rate of change of the linear predictor by the estimate.
12. The method of claim 8, wherein calculating a case deleted estimate comprises calculating case deleted linear predictors η_((i)) such that:

$\eta_{(i)} = {\eta_{i} - {\left( \frac{h_{i}}{1 - h_{i}} \right){g^{\prime}\left( \mu_{i} \right)}\left( {y_{i} - \mu_{i}} \right)}} \quad\text{where}\quad h_{i} = {\sum\limits_{jk}{X_{ij}C_{jk}X_{ik}W_{i}}} \quad\text{and}\quad \frac{\partial\eta_{i}}{\partial\mu_{i}} = {g^{\prime}\left( \mu_{i} \right)}.$

13. The method of claim 4, wherein calculating a case deleted estimate comprises calculating, for each entity, a case deleted estimate by running the intermediate model based on the input data with the attribute values associated with the entity omitted to generate a respective set of case deleted model parameters.
14. The method of claim 4, wherein the statistical modelling method generates a Generalised Linear Model.
15. The method of claim 4, wherein the statistical modelling method generates a Generalised Non-linear Model.
16. The method of claim 1, wherein the statistical modelling method generates a Generalised Linear Model.
17. The method of claim 1, wherein the statistical modelling method generates a Generalised Non-linear Model.

18. A computer-readable medium that stores computer-readable instructions thereon that, when executed by a processor, direct the processor to perform a method for analysing input data, wherein the input data comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity, the analysis generates a model for predicting an outcome value for a further physical entity based on input data comprising attribute values associated with the further physical entity, the method comprising: processing, by the processor, the input data using a statistical modelling method to generate the model based on the input data; calculating, by the processor, a case deleted estimate of the outcome value for each of the set of physical entities; calculating, by the processor, a measure of deviance of the case deleted estimates from the outcome values of the input data; and outputting, by the processor, the measure of deviance.
19. A computer-readable medium that stores computer-readable instructions thereon that, when executed by a processor, direct the processor to perform a method for analysing input data, wherein the input data comprises, for each of a set of physical entities, attribute values representing attributes of the respective physical entity and an outcome value representing an observed outcome for the respective physical entity, the analysis generates a model for predicting an outcome value for a further physical entity based on input data comprising attribute values associated with the further physical entity, the method comprising: processing, by the processor, the input data using a statistical modelling method to generate an intermediate model based on the input data, the intermediate model comprising parameter estimates and a variance/covariance matrix; calculating, by the processor, a case deleted estimate of the outcome value for each of the set of physical entities based on the intermediate model; and generating, by the processor, a noise reduced model comprising noise reduced parameters, a noise reduced variance/covariance matrix, and noise reduced case deleted estimates using an iterative process to minimise a measure of deviance of the noise reduced case deleted estimates from the outcome values of the input data.